Welcome to the LiPD Quickstart Notebook!

This Notebook was created to help you get familiar with the commands you can use in LiPD. Follow each step and experiment as much as you'd like! Each step will help prepare you for the next step.

For this tutorial we will be using the example files found in the GitHub repository's Examples folder.

Last Notebook Edit : 05.10.17

Install Package

This guide assumes that you have already followed the installation steps and the LiPD package is installed on your computer.

Import Package

Import the LiPD package into your python environment.



In [ ]:
import lipd

Quitting

Quitting the session is important. It performs cleanup functions and reclaims the disk space used by opened LiPD files on your computer.



In [ ]:
lipd.quit()

Reading Files

There are three valid file types that you may import. Use the appropriate function for the file type that you would like to read.

  • LiPD (.lpd)
  • Excel (.xls or .xlsx)
  • NOAA text (.txt)

In [ ]:
# Read File - GUI
lipd.readLipd()
lipd.readExcel()
lipd.readNoaa()

# Read File - with path argument - no GUI
lipd.readLipd("/path/to/file.lpd")
lipd.readExcel("/path/to/file.xls")
lipd.readNoaa("/path/to/file.txt")


# Read Directory - GUI
lipd.readLipds()
lipd.readExcels()
lipd.readNoaas()

# Read Directory - with path argument - no GUI
lipd.readLipds("/path/to/dir/")
lipd.readExcels("/path/to/dir/")
lipd.readNoaas("/path/to/dir/")

# Read Directory - all file types - GUI
lipd.readAll()

# Read Directory - all file types - no GUI
lipd.readAll("/path/to/dir/")
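When a folder mixes file types, the file extension decides which reader applies. The sketch below maps a path to the name of the matching single-file reader from the cell above. Note that `pick_reader` and `READERS` are hypothetical names made up for illustration; they are not part of the LiPD package.

```python
import os

# Extension -> name of the single-file reader listed in the cell above.
# READERS and pick_reader are illustrative helpers, not LiPD functions.
READERS = {
    ".lpd": "readLipd",
    ".xls": "readExcel",
    ".xlsx": "readExcel",
    ".txt": "readNoaa",
}

def pick_reader(path):
    """Return the reader function name for a path, or None if unsupported."""
    ext = os.path.splitext(path)[1].lower()
    return READERS.get(ext)

print(pick_reader("/path/to/file.lpd"))
print(pick_reader("/path/to/file.xlsx"))
print(pick_reader("/path/to/notes.csv"))
```

With the package imported, the returned name could be looked up with `getattr(lipd, name)` and called with the path, provided the name is not `None`.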

Excel Spreadsheet Converter


Microsoft Excel spreadsheets must be converted to LiPD before any LiPD functions can be used. Use the Excel template to create an Excel file with your data. Make sure to follow the formatting guidelines and the hints noted throughout the spreadsheet.


In [ ]:
lipd.excel()

NOAA Converter


National Oceanic and Atmospheric Administration (NOAA) text files must be converted to LiPD before any LiPD functions can be used. The converter is designed to parse data from the NOAA text template. Please insert your data in this template format to ensure a complete and accurate conversion to LiPD. Use the example file as a reference for correct formatting.

LPD to NOAA:
Converts all LiPD files into NOAA text files. Creates one NOAA text file for each data table found in the LiPD file. The LiPD file is updated to include the WDC URL that links to the corresponding NOAA dataset that it creates.

NOAA to LPD:
Converts a NOAA text file into a LiPD file.

NOAA text template:
NOAA template

Example File:
khider2011b.txt


In [ ]:
# Run the function
lipd.noaa()

# Choose a conversion when prompted:
# Which conversion?
# 1. LPD to NOAA
# 2. NOAA to LPD

DOI Updater


The DOI updater takes your LiPD files and updates them with the most recent information provided by doi.org. The updater runs once per LiPD file and skips any LiPD files that were updated previously.


In [ ]:
lipd.doi()

Writing Files


Save all datasets currently loaded to LiPD files.


In [ ]:
# Write Files - GUI
lipd.writeLipds()

# Write with path argument - No GUI
lipd.writeLipds("/path/to/dir/")

Pickling Data


The pickle module is a Python core module that allows us to share LiPD data with Python 2.7 users. It won't have the support or functions of LiPD Utilities, but it will give access to the data.

A Pickle file (.pklz) is a compressed archive file. Its small size makes sharing easy.


In [ ]:
import pickle
import gzip

# Read a pickle file
with gzip.open('filename.pklz', 'rb') as f:
    newData = pickle.load(f)


# Write a pickle file
# protocol=2 is the newest pickle format that Python 2.7 can load
yourData = {'a': 'blah', 'b': list(range(10))}
with gzip.open('filename.pklz', 'wb') as f:
    pickle.dump(yourData, f, protocol=2)

Other Functions


The functions below are not critical to the use of LiPD Utilities, but are included for convenience as helper functions that may make your workflow easier.

SHOW data


Show functions are useful for printing data to the console. Note: Some consoles will truncate extensive output. Printing large data to the console may not work well.

showLipds()

  • Show the names of the LiPD files in the current LiPD Library.

showMetadata(filename)

  • Show metadata for a specific dataset

showCsv(filename)

  • Show CSV data for a specific dataset

showDfs(dataframe_dictionary)

  • Show a list of dataframes in a dataframe dictionary

GET data


Get functions are useful for retrieving data and placing it in the workspace as a variable.

getCsv(filename)

  • Returns: dictionary
  • Get the values for the specific dataset.

getMetadata(filename)

  • Returns: dictionary
  • Get the metadata for the specific dataset.

In [ ]:
odp_csv = lipd.getCsv("ODP1098B12.lpd")

In [ ]:
odp_metadata = lipd.getMetadata("ODP1098B12.lpd")

Library Data


All LiPD Library data is stored in Python "objects" that allow us to manage and manipulate the data more easily. However, storing and pickling the data in this structure is not ideal for most situations. The JSON format is recommended as it is universal and easier to share.

getLibrary()

  • Returns: dictionary
  • Retrieves the LiPD Library as a Python dictionary.

In [ ]:
D = lipd.getLibrary()
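Since JSON is the recommended format for sharing, here is a minimal sketch of a JSON round trip using Python's standard `json` module. It uses a small stand-in dictionary in place of the result of `lipd.getLibrary()`; the dataset name and fields are invented, and the sketch assumes every value in the dictionary is JSON-serializable, which may not hold for a real library.

```python
import json

# Stand-in for the dictionary returned by lipd.getLibrary();
# the dataset name and fields here are invented for illustration.
D = {
    "ExampleDataset": {
        "archiveType": "marine sediment",
        "geo": {"meanElev": -1010},
    }
}

# Serialize to a JSON string (json.dump with a file object writes to disk instead)
text = json.dumps(D, indent=2, sort_keys=True)
print(text)

# Reading the JSON back gives an equal dictionary
restored = json.loads(text)
print(restored == D)
```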

Library Data (example)


Library data holds many datasets that are sorted by name. Below is an example of what the library looks like. There are three files in this library, and one file is expanded to show its contents.


In [ ]:
%%html
<img src="./d.png" />

TimeSeries


TimeSeries functions are useful for creating, filtering, and exporting TimeSeries from the LiPD data in the workspace.

extractTs()

  • Returns: dictionary
  • Creates a time series from the data in the workspace.

collapseTs(time_series)

  • Returns: none
  • Puts time series data back into the workspace data.

find(expression, time_series)

  • Returns: List, all objects that matched the expression
  • Find all time series objects that match the given criteria.

TimeSeries Object (example)


A time series holds many time series objects. Below is an example of what the contents of a time series object look like.


In [ ]:
%%html
<img src="./tso1.png" />
<img src="./tso2.png" />

In [ ]:
time_series = lipd.extractTs()

In [ ]:
new_time_series = lipd.find("archiveType is marine sediment", time_series)

In [ ]:
new_time_series = lipd.find("geo_meanElev <= -1000 && geo_meanElev > -1100", time_series)
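To make the two expressions above concrete, the sketch below applies the same conditions in plain Python to a few stand-in time series objects. It assumes each object behaves like a flat dictionary with keys such as `archiveType` and `geo_meanElev`; the values are invented, and this is not how `lipd.find` is implemented.

```python
# Stand-in time series objects: flat dictionaries with invented values.
time_series = [
    {"dataSetName": "A", "archiveType": "marine sediment", "geo_meanElev": -1050},
    {"dataSetName": "B", "archiveType": "marine sediment", "geo_meanElev": -2500},
    {"dataSetName": "C", "archiveType": "coral", "geo_meanElev": -5},
]

# Same condition as "archiveType is marine sediment"
marine = [ts for ts in time_series if ts.get("archiveType") == "marine sediment"]

# Same condition as "geo_meanElev <= -1000 && geo_meanElev > -1100"
in_range = [
    ts for ts in time_series
    if "geo_meanElev" in ts and -1100 < ts["geo_meanElev"] <= -1000
]

print([ts["dataSetName"] for ts in marine])    # ['A', 'B']
print([ts["dataSetName"] for ts in in_range])  # ['A']
```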

In [ ]:
lipd.collapseTs(time_series)

Pandas Dataframes


ensToDf(arrays)

  • Returns: data frame (obj)
  • Create an ensemble data frame from some given nested numpy arrays

lipdToDf(filename)

  • Returns: data frame(s) (dictionary)
  • Creates a collection of pandas data frames from LiPD data

tsToDf(time_series, filename)

  • Returns: data frame(s) (dictionary)
  • Creates a collection of pandas data frames from a TimeSeries object. The CSV data frame will be plotted with depth, age, and year columns when available.

In [ ]:
dfs_lipd = lipd.lipdToDf("ODP1098B12.lpd")

In [ ]:
lipd.showDfs(dfs_lipd)

In [ ]:
dfs_lipd["metadata"]

In [ ]:
dfs_lipd["paleoData"]["ODP1098B12.Paleo1.measurementTable1.csv"]

In [ ]:
dfs_lipd["chronData"]["ODP1098B12.Chron1.measurementTable1.csv"]
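The cells above index the result of `lipdToDf` first by section and then by table file name. As a rough sketch of how to discover those table names, the example below walks a stand-in dictionary with the same key layout. The placeholder strings sit where the real result would hold pandas data frames, and the assumption that only "paleoData" and "chronData" hold nested dictionaries is taken from the cells above.

```python
# Stand-in with the same key layout as the cells above; the values are
# placeholder strings where lipdToDf would hold pandas data frames.
dfs = {
    "metadata": "<metadata frame>",
    "paleoData": {"ODP1098B12.Paleo1.measurementTable1.csv": "<frame>"},
    "chronData": {"ODP1098B12.Chron1.measurementTable1.csv": "<frame>"},
}

# Collect the table names under each section that holds a dictionary of tables
tables = {
    section: sorted(content)
    for section, content in dfs.items()
    if isinstance(content, dict)
}

for section, names in tables.items():
    print(section, names)
```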

In [ ]:
dfs_ts = lipd.tsToDf(time_series, "ODP1098B12_data_SST")

In [ ]:
lipd.showDfs(dfs_ts)

In [ ]:
dfs_ts["metadata"]

In [ ]:
dfs_ts["paleoData"]

In [ ]:
dfs_ts["chronData"]["ODP1098B12"]

Removing LiPDs


removeLipd(filename)

  • Remove one dataset from the LiPD Library

removeLipds()

  • Remove all datasets from the LiPD Library

In [ ]:
lipd.removeLipds()

Glossary


DOI

A Digital Object Identifier is a unique alphanumeric string assigned by a registration agency to identify content and provide a persistent link to its location on the Internet. The publisher assigns a DOI when your article is published and made available electronically.

Environment / Workspace

The current state of the Notebook. Variables and modules in the Notebook are constantly changing, and all of these contribute to the state of the workspace.

LiPD

Refers to the LiPD package or LiPD files, depending on the context.

LiPD Library

A collection of LiPD file data.

Module

A set of related functions that is imported for use in the Notebook.

NOAA

National Oceanic and Atmospheric Administration. This document uses NOAA to signify a specific text file format used by the organization.

Notebook

Jupyter uses Notebooks as a way to save a single session of workflow and scientific computations. This Quickstart Notebook is for learning and documentation, though you may later create your own Notebooks with graphs, functions, and various datasets.

Magic Commands

Special built-in Jupyter commands that provide common useful functions that "magically" work.

TimeSeries Library

A collection of TimeSeries data and objects.

TimeSeries Object

An extracted piece of the TimeSeries from a LiPD file.